Skip to content

Fix uint overflow bugs in std::{at_vec, vec, str} #9108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed

Fix uint overflow bugs in std::{at_vec, vec, str} #9108

wants to merge 7 commits into from

Conversation

bluss
Copy link
Member

@bluss bluss commented Sep 11, 2013

Fix uint overflow bugs in std::{at_vec, vec, str}

Closes #8742

Fix issue #8742, which summarized is: unsafe code in vec and str did assume
that a reservation for X + Y elements always succeeded, and didn't overflow.

Introduce the method Vec::reserve_additional(n) to make it easy to check for
overflow in Vec::push and Vec::push_all.

In std::str, simplify and remove a lot of the unsafe code and use push_str
instead. With improvements to .push_str and the new function
vec::bytes::push_bytes, it looks like this change has either no or positive
impact on performance.

I believe there are many places still where v.reserve(A + B) still can overflow.
This by itself is not an issue unless followed by (unsafe) code that steps aside
boundary checks.

@emberian
Copy link
Member

LGTM. I'm ok with the perf hit.

@bluss
Copy link
Member Author

bluss commented Sep 11, 2013

Most of the removal of unsafe code and simplification is because I don't want to check for overflow everywhere. It's messy, and we can let boundary checks or .reserve() do the work most of the time. However we may need a better plan. What do we really want to do for the v.reserve(X + Y) case. Ignore the overflow in addition and handle it in bounds checks? Change the "size_t" type to something that always uses checked addition?

@erickt
Copy link
Contributor

erickt commented Sep 11, 2013

Bounds checking sounds like a good candidate for a __builtin_expect-style function that hints to the branch predictor to assume we'll almost always not overflow when resizing. Do we have anything like that in the compiler, and if not, how hard would it be to add it?

@emberian
Copy link
Member

More importantly: how much impact would it actually have?

On Wed, Sep 11, 2013 at 6:25 PM, Erick Tryzelaar
notifications@github.comwrote:

Bounds checking sounds like a good candidate for a __builtin_expecthttp://llvm.org/docs/BranchWeightMetadata.html-style
function that hints to the branch predictor to assume we'll almost always
not overflow when resizing. Do we have anything like that in the compiler,
and if not, how hard would it be to add it?


Reply to this email directly or view it on GitHubhttps://github.com//pull/9108#issuecomment-24281515
.

@emberian
Copy link
Member

+1 to branch prediction hints; conversation that convinced me at https://botbot.me/mozilla/rust/msg/5945880/

@brson
Copy link
Contributor

brson commented Sep 12, 2013

I'm very skeptical of branch prediction hints generally. It's not uncommon to annotate them incorrectly, or for them to be useless.

@thestinger
Copy link
Contributor

I have a feeling that LLVM's branch prediction hints don't do much when you aren't optimizing for size.

@bluss
Copy link
Member Author

bluss commented Sep 13, 2013

it's not an important part. More important would be to reduce the number of places we need to check for overflow

Like next_power_of_two, but returns None on overflow.
@bluss
Copy link
Member Author

bluss commented Sep 16, 2013

Rebased to avoid conflicts (new name of Option::unwrap_or).

I changed .reserve_additional to check the vec capacity before doing anything; I think this is an improvement that puts the failure case (in this case overflow) inside a branch that is only taken seldomly.

I changed vec::bytes::push_bytes to not use .mut_slice() but to use ptr::copy_memory directly. As the commit log says, It's good to avoid overhead and use a tad more unsafe code since much of str will depend on this function.

With these changes, it looks like the StrVector method .connect() is faster with its safe implementation on top of the change series, than its unsafe implementation before.

I'm not sure if I'm benchmarking this correctly. It could be that .connect() just optimizes to the point of invalidating the benchmark with the changes, but that's a win in itself..

with local rustc --opt-level=3 -Z no-monomorphic-collapse --test std.rs

before
test str::bench::bench_connect ... bench: 333 ns/iter (+/- 2)
test str::bench::bench_push_str ... bench: 163 ns/iter (+/- 4)

after
test str::bench::bench_connect ... bench: 178 ns/iter (+/- 3)
test str::bench::bench_push_str ... bench: 162 ns/iter (+/- 2)

I'm not sure if the benchmarks should stay in the PR, but apart from that I think this change is ready for review & merge.

@bluss
Copy link
Member Author

bluss commented Sep 16, 2013

Regarding branch predicition, since the overflow check is now inside a rarely taken if, it's less of an issue. I'd guess that using CheckedAdd we give enough information already to llvm to adjust branch prediction.

@bluss
Copy link
Member Author

bluss commented Sep 16, 2013

Updated the PR text (because this information will show in the merge message, so I want it to be correct)

@bluss
Copy link
Member Author

bluss commented Sep 16, 2013

Now the stage2 bench is done for before and after:

before:
test str::bench::bench_connect ... bench: 302 ns/iter (+/- 7)
test str::bench::bench_push_str ... bench: 140 ns/iter (+/- 3)

after
test str::bench::bench_connect ... bench: 225 ns/iter (+/- 24)
test str::bench::bench_push_str ... bench: 139 ns/iter (+/- 2)

blake2-ppc added 3 commits September 16, 2013 19:13
Issue #8742

Add the method `.reserve_additional(n: uint)`: Check for overflow in
self.len() + n, and reserve that many elements (rounded up to next power
of two). Does nothing if self.len() + n < self.capacity() already.
`push_bytes` is implemented with `ptr::copy_memory` here since this
function is intended to be used to implement `.push_str()` for str, so
we want to avoid the overhead.
/// Access the str in its vector representation.
/// The caller must preserve the valid UTF-8 property when modifying.
#[inline]
pub unsafe fn as_owned_vec<U>(s: &mut ~str, f: &fn(&mut ~[u8]) -> U) -> U {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be expressed as fn as_owned_vec<'a>(s: &'a mut ~str) -> &'a mut ~[u8] ?

@alexcrichton
Copy link
Member

This looks good to me! Sad to see a positive diff stat, but I guess that's what happens when you write tests :)

r+ with my comment

blake2-ppc added 3 commits September 17, 2013 02:47
Easy to reproduce:

    let mut v = ~[@1];
    v.resize(-1);  // success a.k.a silent failure
    v.push(@2); // segfault
@bluss
Copy link
Member Author

bluss commented Sep 17, 2013

Incorporated your suggestion, also found one place where as_owned_vec could be used (as_mut_buf), so those are the changes squashed into the std::str: Fix overflow.. commit.

bors added a commit that referenced this pull request Sep 17, 2013
…hton

Fix uint overflow bugs in std::{at_vec, vec, str}

Closes #8742

Fix issue #8742, which summarized is: unsafe code in vec and str did assume
that a reservation for `X + Y` elements always succeeded, and didn't overflow.

Introduce the method `Vec::reserve_additional(n)` to make it easy to check for
overflow in `Vec::push` and `Vec::push_all`.

In std::str, simplify and remove a lot of the unsafe code and use `push_str`
instead. With improvements to `.push_str` and the new function
`vec::bytes::push_bytes`, it looks like this change has either no or positive
impact on performance.

I believe there are many places still where `v.reserve(A + B)` still can overflow.
This by itself is not an issue unless followed by (unsafe) code that steps aside
boundary checks.
@bors bors closed this Sep 17, 2013
flip1995 pushed a commit to flip1995/rust that referenced this pull request Jul 18, 2022
…xFrednet

Remove trailing spaces

Closes rust-lang#9108
changelog: remove trailing spaces,  which are not allowed by the JSON standard.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hazards on uint overflow in std::vec
7 participants